Regular Hilberg Processes: An Example of
Processes with a Vanishing Entropy Rate
Abstract

A regular Hilberg process is a stationary process that satisfies both a 
hyperlogarithmic growth of maximal repetition and a power-law growth 
of topological entropy, which are a kind of dual conditions. The
hyperlogarithmic growth of maximal repetition has been experimentally
observed for texts in natural language, whereas the power-law growth of 
topological entropy implies a vanishing Shannon entropy rate and thus 
probably does not hold for natural language. In this paper, we provide a 
constructive example of regular Hilberg processes, which we call random 
hierarchical association (RHA) processes. Our construction does not 
apply the standard cutting and stacking method. For the constructed 
RHA processes, we demonstrate that the expected length of any uniquely 
decodable code is orders of magnitude larger than the Shannon block 
entropy of the ergodic component of the RHA process. Our proposition 
supplements the classical result by Shields concerning nonexistence of 
universal redundancy rates. 
I Main ideas and results


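Throughout this paper we identify stationary processes with their distributions
(stationary measures) and we use the terms “measure” and “process” interchangeably.
Consider thus a stationary measure $\mu$ on the measurable space $(\mathbb{A}^{\mathbb{N}}, \mathcal{A}^{\mathbb{N}})$ of infinite
sequences from a finite alphabet $\mathbb{A} \subset \mathbb{N}$. The random symbols will be denoted as
$\xi_i : \mathbb{A}^{\mathbb{N}} \ni (x_j)_{j\in\mathbb{N}} \mapsto x_i \in \mathbb{A}$, whereas blocks of symbols will be denoted as
$\xi_{k:l} = (\xi_i)_{i=k}^{l}$. The expectation with respect to $\mu$ is denoted as $\mathbf{E}_\mu$.
We also use the shorthand $\mu(x_{1:m}) = \mu(\xi_{1:m} = x_{1:m})$. The Shannon block entropy
of measure $\mu$ is the function
$$H_\mu(m) := \mathbf{E}_\mu\left[-\log \mu(\xi_{1:m})\right], \qquad (1)$$
and the Shannon entropy rate of $\mu$ is the limit
$$h_\mu := \inf_{m\in\mathbb{N}} \frac{H_\mu(m)}{m} = \lim_{m\to\infty} \frac{H_\mu(m)}{m}. \qquad (2)$$

Let us introduce two functions of an individual block $\xi_{1:k}$. The first one is
the maximal repetition […]
$$L(\xi_{1:k}) := \max\{m : \text{some } x_{1:m} \text{ is repeated in } \xi_{1:k}\}, \qquad (3)$$
where the two occurrences of $x_{1:m}$ start at different positions but may overlap,
whereas the second one is the topological entropy
$$H_{\mathrm{top}}(m|\xi_{1:k}) := \log \operatorname{card}\{x_{1:m} : x_{1:m} \text{ is a substring of } \xi_{1:k}\}, \qquad (4)$$
which is the logarithm of subword complexity […]. In this paper,
we are interested in the following class of stationary processes, defined using the
Big O notation: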

Definition 1 (a variation of a definition in [11]) A stationary measure $\mu$
on the measurable space of infinite sequences $(\mathbb{A}^{\mathbb{N}}, \mathcal{A}^{\mathbb{N}})$ is called a regular Hilberg
process with an exponent $\beta \in (0,1)$ if it satisfies conditions
$$L(\xi_{1:m}) = \Theta\left((\log m)^{1/\beta}\right), \qquad (5)$$
$$H_{\mathrm{top}}(m|\xi_{1:\infty}) = \Theta\left(m^{\beta}\right) \qquad (6)$$
$\mu$-almost surely, where the lower bound for the maximal repetition and the upper
bound for the topological entropy are uniform in $\xi_{1:\infty}$.
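To make definitions (3) and (4) concrete, the following minimal Python sketch computes the maximal repetition and the topological entropy of a finite string. The helper names `maximal_repetition` and `topological_entropy` are our own hypothetical choices, logarithms are natural (matching the use of $\exp$ in the proof of Theorem 1), and repeats are allowed to overlap.

```python
import math


def maximal_repetition(s):
    """L(s) from (3): the largest m such that some block of length m occurs
    at two different (possibly overlapping) positions of s; 0 if none."""
    k = len(s)
    best = 0
    for m in range(1, k):
        seen = set()
        repeated = False
        for i in range(k - m + 1):
            block = tuple(s[i:i + m])
            if block in seen:
                repeated = True
                break
            seen.add(block)
        if not repeated:
            # a repeat of length m + 1 would contain a repeat of length m
            break
        best = m
    return best


def topological_entropy(m, s):
    """H_top(m | s) from (4): natural log of the number of distinct
    length-m substrings of s (assumes 1 <= m <= len(s))."""
    blocks = {tuple(s[i:i + m]) for i in range(len(s) - m + 1)}
    return math.log(len(blocks))
```

For example, `maximal_repetition("abcabca")` returns 4 (the block `abca` occurs at positions 1 and 4), and `topological_entropy(2, "abab")` equals $\log 2$, since only `ab` and `ba` occur. The early exit in `maximal_repetition` relies on the fact that the prefix of a repeated block is itself repeated.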

The original definition in [11] uses condition $H_\mu(m) = \Theta(m^\beta)$ rather
than (6) and condition $\mathbf{E}_\mu L(\xi_{1:m}) = \Theta((\log m)^{1/\beta})$ instead of (5). Condition
$H_\mu(m) = \Theta(m^\beta)$ was originally contemplated by Hilberg […], hence the
name of the class of processes. Conditions (5) and (6) are, however,
more natural since they pertain to an individual sequence $\xi_{1:\infty}$ and are dual in
view of the following proposition:

Theorem 1 ([…]) If $H_{\mathrm{top}}(m|\xi_{1:k}) < \log(k - m + 1)$ then $L(\xi_{1:k}) \ge m$.

Proof: String $\xi_{1:k}$ contains $k - m + 1$ substrings of length $m$ (on overlapping
positions). Among them there can be at most $\exp(H_{\mathrm{top}}(m|\xi_{1:k}))$ different
substrings. Since $\exp(H_{\mathrm{top}}(m|\xi_{1:k})) < k - m + 1$, there must be some repeat of
length $m$. Hence $L(\xi_{1:k}) \ge m$. □
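Theorem 1 is a pigeonhole argument, and it can be confirmed exhaustively on short strings. The sketch below is an illustrative check under our own assumptions (binary alphabet, strings of length at most `max_len`, natural logarithms); the names `max_repetition`, `h_top` and `check_theorem1` are hypothetical. To keep the check from being circular, the maximal repetition is computed independently as the longest common prefix of two distinct suffixes.

```python
import math
from itertools import product


def max_repetition(s):
    """L(s): length of the longest block occurring at two different
    positions of s (overlaps allowed), computed as the longest common
    prefix of two distinct suffixes."""
    k, best = len(s), 0
    for i in range(k):
        for j in range(i + 1, k):
            l = 0
            while j + l < k and s[i + l] == s[j + l]:
                l += 1
            best = max(best, l)
    return best


def h_top(m, s):
    """H_top(m | s) in nats."""
    return math.log(len({s[i:i + m] for i in range(len(s) - m + 1)}))


def check_theorem1(max_len, alphabet="ab"):
    """Exhaustively confirm Theorem 1 on all strings of length <= max_len."""
    for k in range(1, max_len + 1):
        for letters in product(alphabet, repeat=k):
            s = "".join(letters)
            rep = max_repetition(s)
            for m in range(1, k + 1):
                if h_top(m, s) < math.log(k - m + 1):
                    assert rep >= m, (s, m)
    return True
```

Calling `check_theorem1(8)` runs through all 510 binary strings of length 1 to 8 without a failed assertion.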
In particular, since $H_{\mathrm{top}}(m|\xi_{1:k}) \le H_{\mathrm{top}}(m|\xi_{1:\infty})$, Theorem 1 yields
$$H_{\mathrm{top}}(m|\xi_{1:\infty}) = O(m^\beta) \implies L(\xi_{1:m}) = \Omega\left((\log m)^{1/\beta}\right),$$
$$L(\xi_{1:m}) = O\left((\log m)^{1/\beta}\right) \implies H_{\mathrm{top}}(m|\xi_{1:\infty}) = \Omega(m^\beta).$$

Now we can see that the lower bound in (5) is implied by the upper bound in
(6), whereas the upper bound in (5) implies the lower bound in (6). We might
therefore suppose that conditions (5) and (6) indeed hold simultaneously for
some class of processes.

Why is this problem important? In fact, according to some experimental
measurements of maximal repetition, the hyperlogarithmic growth (5) holds
approximately with $\beta \approx 0.4$ for texts in English, French, and German, where
the lower bound for the growth of maximal repetition seems uniform, i.e., text-independent
[3, …]. Thus understanding how to construct some class of processes
satisfying condition (5) may contribute to an improvement in statistical
models of natural language. Although condition $H_\mu(m) = \Theta(m^\beta)$, related to
(6), was actually considered in [3] as a hypothesis for natural language, here we
should admit that the combination of conditions (5) and (6) is likely too strong
to be required from natural language models. As we will show, the power
law (6) implies a vanishing Shannon entropy rate, $h_\mu = 0$, whereas the overwhelming
empirical evidence asserts that the Shannon entropy rate of natural
language is strictly positive, about 1 bit per character […].
Nevertheless, constructing stationary processes that satisfy the hyperlogarithmic
growth (5) is nontrivial enough, so it may be illuminating to consider first
a somewhat unrealistic class of processes that also satisfy the power law (6).

For the regular Hilberg processes there are two general results. As mentioned,
it can be seen easily that the power law (6) implies a vanishing Shannon
entropy rate.

Theorem 2 We have $h_\mu = 0$ for a regular Hilberg process $\mu$.


Proof: The argument involves the random ergodic measure $F = \mu(\cdot|\mathcal{I})$, where
$\mathcal{I}$ is the shift-invariant algebra [21]. By the ergodic theorem for stationary
processes [21], we have $\mu$-almost surely
$$H_{\mathrm{top}}(m|\xi_{1:\infty}) \ge \log \operatorname{card}\{x_{1:m} : F(x_{1:m}) > 0\} \ge H_F(m), \qquad (7)$$
so $h_F = 0$ follows from (6), whereas as shown in […] we have
$$h_\mu = \mathbf{E}_\mu h_F, \qquad (8)$$
from which $h_\mu = 0$ follows. □

Moreover, the ergodic decomposition of a regular Hilberg process, as defined
in Definition 1, consists of ergodic regular Hilberg processes. Namely, we have:

Theorem 3 For a regular Hilberg process $\mu$ with exponent $\beta$, the random ergodic
measure $F = \mu(\cdot|\mathcal{I})$, where $\mathcal{I}$ is the shift-invariant algebra, $\mu$-almost surely
constitutes an ergodic regular Hilberg process with exponent $\beta$.

Proof: We have $\mu = \int F\, d\mu$. Hence every event of full measure $\mu$ must be
$\mu$-almost surely an event of full measure $F$. This implies the claim. □


We suppose that the above property is not true for the original definition of
a regular Hilberg process given in article [11], but we do not investigate this
problem in this paper.

We will now present a constructive example of regular Hilberg processes.
The example will be called random hierarchical association (RHA) processes.
The RHA processes are parameterized by certain free parameters which we will
call perplexities (a name borrowed from computational linguistics). Approximately,
perplexity is the number of distinct blocks of length $2^n$ that appear
in the process realization. In this sense, the term “perplexity” is also used in
computational linguistics. It turns out that by controlling perplexities, we can control
the value of the Shannon block entropy and force the Shannon entropy rate
to be zero. It turns out as well that we can control the value of the topological
entropy and the maximal repetition. In this way we can construct a stationary
process exhibiting quite an arbitrary desired growth of the topological entropy
and the maximal repetition, such as a regular Hilberg process.

We have invented the RHA processes as a construction unrelated to the
cutting and stacking method […], used for constructing stationary processes
with certain desired properties. The cutting and stacking method seems more
abstract and more general than the RHA process method. Certainly, these
two methods adopt very different strategies. The cutting and stacking method,
being a tool borrowed from ergodic theory, approximates the constructed process
by an abstract dynamical system. This dynamical system consists of the
Lebesgue measure on the unit interval with an incrementally constructed partition
and transformation. In contrast, the RHA process method begins with
some nonstationary nonergodic process from which we obtain a given stationary
ergodic measure by taking the stationary mean and ergodic decomposition. For
our particular application of constructing regular Hilberg processes, the RHA
process method is sufficient and seems natural enough, but it is likely insufficient
for constructing processes which satisfy condition (5) without condition (6). In
the latter case, which is the case of interest for modeling natural language, using
the cutting and stacking method is a possible approach, but we have not figured out
yet how to implement it exactly.

To briefly explain our method, the RHA processes are formed in two relatively
simple steps. First, we sample recursively random pools of $k_n$ distinct
blocks of length $2^n$, which are formed by concatenation of $k_n$ randomly selected
pairs chosen from the $k_{n-1}$ distinct blocks of length $2^{n-1}$ sampled in the previous step
(the recursion stops at blocks of length 1, which are fixed symbols). Second,
we obtain an infinite sequence of random symbols by concatenating blocks of
lengths $2^0, 2^1, 2^2, \ldots$ randomly chosen from the respective pools. As a result
there cannot be more than $k_n^2$ distinct blocks of length $2^n$ that appear in the final
process realization. The selection of these blocks is, however, random and we
do not know them a priori. This is one reason why the constructed process
satisfies conditions similar to (5) and (6) simultaneously but is nonergodic.

Now we will write down this construction using symbols. 

Step 1: Formally, let perplexities $(k_n)_{n\in\{0\}\cup\mathbb{N}}$ be some sequence of strictly
positive natural numbers that satisfy
$$k_{n-1} \le k_n \le k_{n-1}^2. \qquad (9)$$
Next, for each $n \in \mathbb{N}$, let $(L_{nj}, R_{nj})_{j\in\{1,\ldots,k_n\}}$ be an independent random
combination of $k_n$ pairs of numbers from the set $\{1,\ldots,k_{n-1}\}$ drawn without
repetition. That is, we assume that each pair $(L_{nj}, R_{nj})$ is different, the elements
of pairs may be identical ($L_{nj} = R_{nj}$), and the sequence $(L_{nj}, R_{nj})_{j\in\{1,\ldots,k_n\}}$
is sorted lexicographically. Formally, we assume that random variables $L_{nj}$ and
$R_{nj}$ are supported on some probability space $(\Omega, \mathcal{J}, P)$ and have the uniform
distribution
$$P\big((L_{n1}, R_{n1}, \ldots, L_{nk_n}, R_{nk_n}) = (l_{n1}, r_{n1}, \ldots, l_{nk_n}, r_{nk_n})\big) = \binom{k_{n-1}^2}{k_n}^{-1} \qquad (10)$$
for every lexicographically sorted sequence of $k_n$ distinct pairs $(l_{nj}, r_{nj}) \in \{1,\ldots,k_{n-1}\}^2$.
Subsequently we define random variables
$$Y^0_j := j, \quad j \in \{1,\ldots,k_0\}, \qquad (11)$$
$$Y^n_j := Y^{n-1}_{L_{nj}} \times Y^{n-1}_{R_{nj}}, \quad j \in \{1,\ldots,k_n\},\ n \in \mathbb{N}, \qquad (12)$$
where $a \times b$ denotes concatenation. Hence $Y^n_j$ are $k_n$ distinct blocks of $2^n$ natural
numbers, selected by some sort of random hierarchical concatenation.

Step 2: Variables $Y^n_j$ will be the building blocks of yet another process.
Let $(C_n)_{n\in\{0\}\cup\mathbb{N}}$ be independent random variables, independent from
$(L_{nj}, R_{nj})_{n\in\mathbb{N},\, j\in\{1,\ldots,k_n\}}$, with uniform distribution
$$P(C_n = j) = 1/k_n, \quad j \in \{1,\ldots,k_n\}. \qquad (13)$$

Definition 2 The random hierarchical association (RHA) process $X$ with perplexities
$(k_n)_{n\in\{0\}\cup\mathbb{N}}$ is defined as
$$X = Y^0_{C_0} \times Y^1_{C_1} \times Y^2_{C_2} \times \cdots. \qquad (14)$$

This completes the construction of the RHA processes but it is not the end of 
our discussion of these processes. 
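The two steps can be simulated directly. The following sketch is only an illustration under our own assumptions: it uses Python's `random` module as the source of randomness, it returns a finite prefix of $X$ (the levels $0, \ldots, N$ with $N$ = `len(ks) - 1`, hence $2^{N+1} - 1$ symbols), and the names `sample_pools` and `sample_rha_prefix` are hypothetical. A uniformly random set of $k_n$ distinct pairs is drawn as $k_n$ distinct cells of the $k_{n-1} \times k_{n-1}$ grid and then sorted, which realizes the uniform distribution (10).

```python
import random


def sample_pools(ks, rng):
    """Step 1: return pools with pools[n][j - 1] == Y^n_j (a tuple of 2**n symbols)."""
    pools = [[(j,) for j in range(1, ks[0] + 1)]]  # Y^0_j = j, cf. (11)
    for n in range(1, len(ks)):
        k_prev, k_n = ks[n - 1], ks[n]
        if not k_prev <= k_n <= k_prev ** 2:
            raise ValueError("perplexities violate condition (9)")
        # k_n distinct cells of the k_prev x k_prev grid, i.e. distinct pairs
        # (L_nj, R_nj), sorted lexicographically as required before (10)
        cells = rng.sample(range(k_prev * k_prev), k_n)
        pairs = sorted((c // k_prev + 1, c % k_prev + 1) for c in cells)
        prev = pools[-1]
        pools.append([prev[l - 1] + prev[r - 1] for l, r in pairs])  # cf. (12)
    return pools


def sample_rha_prefix(ks, seed=None):
    """Step 2, truncated: Y^0_{C_0} x ... x Y^N_{C_N} with N = len(ks) - 1,
    i.e. the first 2**len(ks) - 1 symbols of X in (14)."""
    rng = random.Random(seed)
    pools = sample_pools(ks, rng)
    x = []
    for pool in pools:
        x.extend(rng.choice(pool))  # C_n uniform on {1, ..., k_n}, cf. (13)
    return x
```

For instance, with the illustrative perplexities `[2, 3, 5, 10]`, which satisfy (9), `sample_rha_prefix([2, 3, 5, 10], seed=1)` returns 15 symbols from $\{1, 2\}$. Because each level stores all $k_n$ blocks explicitly, this direct approach is only practical for small perplexities, not for the rapidly growing ones used later in Theorem 5.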

It is convenient to define a few more random variables for the RHA process.
First, sequence $X$ will be parsed into a sequence of numbers $X_i$, where
$$X = X_1 \times X_2 \times X_3 \times \cdots, \qquad (15)$$
and, second, we denote blocks starting at any position as
$$X_{k:l} = X_k \times X_{k+1} \times \cdots \times X_l. \qquad (16)$$

The RHA processes defined in Definition 2 are not stationary but they possess
a stationary mean, which is a condition related to asymptotic mean stationarity.
Let us introduce the shift operation $T : \mathbb{A}^{\mathbb{N}} \ni (x_i)_{i\in\mathbb{N}} \mapsto (x_{i+1})_{i\in\mathbb{N}} \in \mathbb{A}^{\mathbb{N}}$.

We recall this definition: 
Definition 3 A measure $\nu$ on $(\mathbb{A}^{\mathbb{N}}, \mathcal{A}^{\mathbb{N}})$ is called asymptotically mean stationary
(AMS) if limits
$$\mu(A) := \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \nu(T^{-i}A) \qquad (17)$$
exist for every event $A \in \mathcal{A}^{\mathbb{N}}$.

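For an AMS measure $\nu$, function $\mu$ is a stationary measure on $(\mathbb{A}^{\mathbb{N}}, \mathcal{A}^{\mathbb{N}})$, called
the stationary mean of $\nu$. Moreover, measures $\mu$ and $\nu$ are equal on the shift-invariant
algebra $\mathcal{I} = \{A \in \mathcal{A}^{\mathbb{N}} : T^{-1}A = A\}$, i.e., $\mu(A) = \nu(A)$ for all $A \in \mathcal{I}$.

Now, let $\mathbb{A}^+ = \bigcup_{n\in\mathbb{N}} \mathbb{A}^n$. There is a related relaxed condition of asymptotic
mean stationarity: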

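Definition 4 A measure $\nu$ on $(\mathbb{A}^{\mathbb{N}}, \mathcal{A}^{\mathbb{N}})$ is called pseudo-asymptotically mean
stationary (pseudo-AMS) if limits
$$\mu(x_{1:m}) := \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \nu(\xi_{i:i+m-1} = x_{1:m}) \qquad (18)$$
exist for every block $x_{1:m} \in \mathbb{A}^+$.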

For a pseudo-AMS measure $\nu$ over a finite alphabet $\mathbb{A}$, function $\mu$, extended via
$\mu(\xi_{1:m} = x_{1:m}) := \mu(x_{1:m})$, is also a stationary measure on $(\mathbb{A}^{\mathbb{N}}, \mathcal{A}^{\mathbb{N}})$. We shall
continue to call this $\mu$ a stationary mean of $\nu$. However, a pseudo-AMS measure
need not be AMS in general, cf. […, Remark in the proof of Lemma 7.16] and
[…, Example 6.3]. In particular, for a pseudo-AMS measure $\nu$ we need not have
$\mu(A) = \nu(A)$ for shift-invariant events $A \in \mathcal{I}$.

It turns out that the RHA processes are pseudo-AMS. 


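Theorem 4 The RHA processes are pseudo-AMS. In particular, for $m \le 2^n$
and $k \in \mathbb{N}$, the stationary mean is
$$\mu(x_{1:m}) = \frac{1}{2^n}\sum_{j=0}^{2^n-1} P(X_{k2^n+j:k2^n+j+m-1} = x_{1:m}). \qquad (19)$$

The proof of Theorem 4 will be presented later in this article.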

We suppose that the RHA processes are also AMS, but we have not been able to prove
it so far. However, we have been able to show that certain RHA processes give
rise to regular Hilberg processes:


Theorem 5 For perplexities
$$k_n = \left\lfloor \exp\left(2^{\beta n}\right) \right\rfloor, \qquad (20)$$
where $0 < \beta < 1$, the stationary mean $\mu$ of the RHA process satisfies the following
conditions:

(i) The Shannon entropy rate is $h_\mu = 0$.

(ii) The Shannon block entropy is sandwiched by
$$C_1 \frac{m}{(\log m)^{\alpha}} \le H_\mu(m) \le C_2\, m \left(\frac{\log\log m}{\log m}\right)^{\alpha} \qquad (21)$$
for sufficiently large $m$, where $\alpha = 1/\beta - 1$ and $C_1, C_2 > 0$ are certain constants.

(iii) The stationary mean $\mu$ is a regular Hilberg process with exponent $\beta$.

(iv) The stationary mean $\mu$ is nonergodic and the Shannon entropy of the shift-invariant
algebra as defined in […] is infinite.

The proof of Theorem 5, which we consider the main result of this paper, will
be postponed as well. Although claim (i) follows from claim (iii) by Theorem
2, it will be established using a different method, of independent interest.

Theorem 5 has some implications for universal coding. For a uniquely decodable
code $C$, we denote its length for block $\xi_{1:m}$ as $|C(\xi_{1:m})|$. We recall that
$\mathbf{E}_\mu |C(\xi_{1:m})| \ge H_\mu(m)$, so the Shannon block entropy provides a lower bound
for compression of a stochastic process. In contrast, a code $C$ is called universal
if
$$\lim_{m\to\infty} \frac{|C(\xi_{1:m})|}{m} = h_\mu \qquad (22)$$
holds almost surely for every stationary ergodic measure $\mu$. Universal codes exist
and the Lempel-Ziv code […] is one example of such a code. The convergence
rate for universal codes can be arbitrarily slow, however. Shields […] showed
that for any uniquely decodable code $C$ and any sublinear function $\rho(m) = o(m)$
there exists an ergodic source $\mu$ such that
$$\limsup_{m\to\infty}\left[\mathbf{E}_\mu |C(\xi_{1:m})| - H_\mu(m) - \rho(m)\right] > 0. \qquad (23)$$


Whereas Shields’ result concerns the nonexistence of a universal sublinear bound
for the difference $|C(\xi_{1:m})| - H_\mu(m)$, one way of supplementing it is to investigate
the ratio $|C(\xi_{1:m})|/H_\mu(m)$. Although this ratio is asymptotically equal to 1
for universal codes and processes with a positive Shannon entropy rate $h_\mu > 0$,
Shields’ result does not predict how the ratio behaves for processes with a vanishing
Shannon entropy rate $h_\mu = 0$. In fact, for the Lempel-Ziv code and
ergodic regular Hilberg processes, there is no essentially sublinear bound for the
ratio $|C(\xi_{1:m})|/H_\mu(m)$:


Theorem 6 Let $C$ be the Lempel-Ziv code. For an ergodic regular Hilberg process
$\mu$ with exponent $\beta$, $\mu$-almost surely
$$\frac{|C(\xi_{1:m})|}{H_\mu(m)} = \Omega\left(\frac{m^{1-\beta}}{(\log m)^{1/\beta-1}}\right). \qquad (24)$$

Proof: By ergodicity, we have $\mu = F$. Thus, by (7) and (6) we obtain
$$H_\mu(m) = H_F(m) \le H_{\mathrm{top}}(m|\xi_{1:\infty}) = O(m^\beta). \qquad (25)$$
On the other hand, the length of the Lempel-Ziv code $|C(\xi_{1:m})|$ for a block $\xi_{1:m}$,
by (5), $\mu$-almost surely satisfies
$$|C(\xi_{1:m})| \ge \frac{m}{L(\xi_{1:m}) + 1}\log\frac{m}{L(\xi_{1:m}) + 1} = \Omega\left(\frac{m}{(\log m)^{1/\beta-1}}\right). \qquad (26)$$
The first inequality in (26) stems from a simple observation in [11] that the
length of the Lempel-Ziv code is greater than $V \log V$, where $V$ is the number
of Lempel-Ziv phrases, whereas the Lempel-Ziv phrases may not be longer than
the maximal repetition plus 1, so that $V \ge m/(L(\xi_{1:m}) + 1)$. □
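The phrase-length observation used for (26) can be checked numerically. The sketch below assumes a greedy LZ77-style parsing (each phrase is the longest prefix of the remaining text that also starts at an earlier position, overlap allowed, plus one more symbol); this particular variant is our assumption, since the text does not fix which Lempel-Ziv variant is meant, and the names `lz77_phrases` and `check_phrase_bound` are hypothetical. The copied part of every phrase is a repeated block, so each phrase has length at most $L + 1$ and hence $V \ge m/(L+1)$. The sketch counts phrases only and does not compute actual code lengths.

```python
import random


def max_repetition(s):
    """L(s): longest block occurring at two different positions (overlaps allowed)."""
    k, best = len(s), 0
    for i in range(k):
        for j in range(i + 1, k):
            l = 0
            while j + l < k and s[i + l] == s[j + l]:
                l += 1
            best = max(best, l)
    return best


def lz77_phrases(s):
    """Greedy LZ77-style parsing: each phrase is the longest prefix of the
    remaining text that also starts at an earlier position (overlap allowed),
    extended by one more symbol."""
    phrases, i, n = [], 0, len(s)
    while i < n:
        longest = 0
        for j in range(i):
            l = 0
            while i + l < n - 1 and s[j + l] == s[i + l]:
                l += 1
            longest = max(longest, l)
        phrases.append(s[i:i + longest + 1])
        i += longest + 1
    return phrases


def check_phrase_bound(trials=200, length=40, seed=0):
    """On random binary strings: phrases have length <= L(s) + 1,
    hence V >= len(s) / (L(s) + 1)."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = "".join(rng.choice("ab") for _ in range(length))
        phrases = lz77_phrases(s)
        rep = max_repetition(s)
        assert "".join(phrases) == s
        assert all(len(p) <= rep + 1 for p in phrases)
        assert len(phrases) >= len(s) / (rep + 1)
    return True
```

For example, `lz77_phrases("aaaaaaa")` yields `["a", "aaaaaa"]`: the second phrase copies five symbols from an overlapping earlier occurrence, and indeed $L(\texttt{aaaaaaa}) = 6$.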
A somewhat more general result holds for the RHA processes from Theorem
5. In this case, we may replace the Lempel-Ziv code with an arbitrary
uniquely decodable code:

Theorem 7 Let $C$ be an arbitrary uniquely decodable code. For the stationary
mean $\mu$ of the RHA process with perplexities (20) and its random ergodic
measure $F = \mu(\cdot|\mathcal{I})$, we have $\mu$-almost surely
$$\frac{\mathbf{E}_\mu |C(\xi_{1:m})|}{H_F(m)} = \Omega\left(\frac{m^{1-\beta}}{(\log m)^{1/\beta-1}}\right). \qquad (27)$$
Ratio (27) can be larger than any function $o(m^{\ldots})$

Proof: The claim follows by (7), (6) (which holds for $\mu$ by Theorem 5 (iii)), (21), and the source coding inequality
$$\mathbf{E}_\mu |C(\xi_{1:m})| \ge H_\mu(m). \qquad (28)$$
□

Theorems 6 and 7 should be read as a warning that the length of a universal code
$|C(\xi_{1:m})|$ is not a very reliable estimate of the Shannon block entropy $H_\mu(m)$
for an ergodic regular Hilberg process. Whereas, using a universal code, we can
reliably estimate the Shannon entropy rate $h_\mu$, the code length $|C(\xi_{1:m})|$ can
be orders of magnitude larger than the Shannon block entropy $H_\mu(m)$.

The remaining parts of this article are devoted to proving the more involved
Theorems 4 and 5. The organization is as follows. In Section II some auxiliary
notations are introduced. In Section III Theorem 4 is demonstrated. In Section
IV the entropies and the maximal repetition for the RHA process and its
stationary mean are related. Section V concerns some further auxiliary results,
such as probabilities of no repeat and a bound for the topological entropy. In
Section VI Shannon block entropies of the RHA processes are discussed. In
Section VII Theorem 5 is proved.


II Auxiliary notations 


Let us recall the construction of the RHA process from the previous section.
In this section we introduce a few notations which will be used further. The
collection of random variables $(L_{nj}, R_{nj})$ will be denoted as
$$\mathcal{G} := (L_{nj}, R_{nj})_{n\in\mathbb{N},\, j\in\{1,\ldots,k_n\}}. \qquad (29)$$
We will also use notations
$$\mathcal{G}_{\le m} := (L_{nj}, R_{nj})_{n\le m,\, j\in\{1,\ldots,k_n\}}, \qquad (30)$$
$$\mathcal{G}_{>m} := (L_{nj}, R_{nj})_{n>m,\, j\in\{1,\ldots,k_n\}}. \qquad (31)$$
Let us observe that collection $\mathcal{G}_{\le m}$ fully determines variables $Y^m_j$ for a fixed $m$.

It is convenient to define a few more random variables for the RHA process.
First, generalizing parsing (15), sequence $X$ will be parsed into a sequence of
blocks $X^n_j$ of length $2^n$, where
$$X = Y^0_{C_0} \times \cdots \times Y^{n-1}_{C_{n-1}} \times Y^n_{C_n}\ (= X^n_1) \times X^n_2 \times X^n_3 \times \cdots. \qquad (32)$$








Let us also observe that there exist unique random variables $K_{nj}$ such that
$$X^n_j = Y^n_{K_{nj}}. \qquad (33)$$
Moreover, generalizing notation (16), we also denote concatenations of consecutive
blocks of length $2^n$ as
$$X^n_{k:l} = X^n_k \times \cdots \times X^n_l. \qquad (34)$$

III Stationary mean 

In this section, we will demonstrate Theorem 4. This theorem states that
the RHA process has a stationary mean in a weaker sense, i.e., it is pseudo-asymptotically
mean stationary (pseudo-AMS).

First we will prove this useful and a bit surprising property, which will be 
used in the present and in the further sections. 

Proposition 1 Variables $K_{nj}$ are independent from $\mathcal{G}_{\le n}$ and satisfy
$$P(K_{nj} = l, K_{n,j+1} = m) = \frac{1}{k_n^2}, \quad l, m \in \{1,\ldots,k_n\},\ j \in \mathbb{N}. \qquad (35)$$

Proof: Each $K_{nj}$ is a function of $C_q$ for some $q \ge n$ and $\mathcal{G}_{>n}$. Hence $K_{nj}$ are
independent from $\mathcal{G}_{\le n}$.

Now we will show by induction on $j$ that (35) is satisfied.

The induction begins with $K_{n1} = C_n$ and $K_{n2} = L_{n+1,C_{n+1}}$. These two
variables are independent by definition, and by definition $K_{n1}$ is uniformly
distributed on $\{1,\ldots,k_n\}$. It remains to show that so is $K_{n2}$. Observe that
$(L_{n+1,k}, R_{n+1,k})$ are independent of $C_{n+1}$. Hence for $l, m \in \{1,\ldots,k_n\}$ we obtain
$$\begin{aligned}
P(K_{n2} = l, K_{n3} = m) &= \sum_{k=1}^{k_{n+1}} P(L_{n+1,k} = l, R_{n+1,k} = m, C_{n+1} = k) \\
&= \frac{1}{k_{n+1}} \sum_{k=1}^{k_{n+1}} P(L_{n+1,k} = l, R_{n+1,k} = m) \\
&= \frac{1}{k_{n+1}} \binom{k_n^2 - 1}{k_{n+1} - 1}\binom{k_n^2}{k_{n+1}}^{-1} = \frac{1}{k_n^2},
\end{aligned}$$
so $K_{n2}$ is uniformly distributed on $\{1,\ldots,k_n\}$.

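The inductive step is as follows: (i) if $K_{n+1,j}$ is uniformly distributed on
$\{1,\ldots,k_{n+1}\}$ then $(K_{n,2j}, K_{n,2j+1}) = (L_{n+1,K_{n+1,j}}, R_{n+1,K_{n+1,j}})$ is uniformly
distributed on $\{1,\ldots,k_n\} \times \{1,\ldots,k_n\}$, and (ii) if $(K_{n+1,j}, K_{n+1,j+1})$ is uniformly
distributed on $\{1,\ldots,k_{n+1}\} \times \{1,\ldots,k_{n+1}\}$ then $(K_{n,2j+1}, K_{n,2j+2}) =
(R_{n+1,K_{n+1,j}}, L_{n+1,K_{n+1,j+1}})$ is uniformly distributed on $\{1,\ldots,k_n\} \times \{1,\ldots,k_n\}$.
Now observe that $(L_{n+1,k}, R_{n+1,k})$ are independent of $(K_{n+1,j}, K_{n+1,j+1})$. Hence, for
$l, m \in \{1,\ldots,k_n\}$ we obtain
$$\begin{aligned}
P(K_{n,2j} = l, K_{n,2j+1} = m) &= \sum_{k=1}^{k_{n+1}} P(L_{n+1,k} = l, R_{n+1,k} = m)\, P(K_{n+1,j} = k) \\
&= \frac{1}{k_{n+1}} \sum_{k=1}^{k_{n+1}} P(L_{n+1,k} = l, R_{n+1,k} = m) \\
&= \frac{1}{k_{n+1}} \binom{k_n^2 - 1}{k_{n+1} - 1}\binom{k_n^2}{k_{n+1}}^{-1} = \frac{1}{k_n^2},
\end{aligned}$$
which proves claim (i). On the other hand, for $l, m \in \{1,\ldots,k_n\}$ we obtain
$$\begin{aligned}
P(K_{n,2j+1} = l, K_{n,2j+2} = m) &= \sum_{p,q=1}^{k_{n+1}} P(R_{n+1,p} = l, L_{n+1,q} = m)\, P(K_{n+1,j} = p, K_{n+1,j+1} = q) \\
&= \frac{1}{k_{n+1}^2} \sum_{p=1}^{k_{n+1}} P(L_{n+1,p} = m, R_{n+1,p} = l) + \frac{1}{k_{n+1}^2} \sum_{p,q=1,\, p\ne q}^{k_{n+1}} P(R_{n+1,p} = l, L_{n+1,q} = m) \\
&= \frac{1}{k_{n+1}^2}\left[\binom{k_n^2 - 1}{k_{n+1} - 1} + (k_n^2 - 1)\binom{k_n^2 - 2}{k_{n+1} - 2}\right]\binom{k_n^2}{k_{n+1}}^{-1} \\
&= \frac{1}{k_{n+1}^2}\left[\frac{k_{n+1}}{k_n^2} + \frac{k_{n+1}(k_{n+1} - 1)}{k_n^2}\right] = \frac{1}{k_n^2},
\end{aligned}$$
which proves claim (ii). □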
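Both claims of the inductive step can be confirmed by exact enumeration for small perplexities. The sketch below is an illustrative check, not part of the proof: it averages over all equally likely pools of `k_next` distinct pairs drawn from a `k_n` by `k_n` grid, as in (10), using exact rational arithmetic; the name `pair_distributions` is hypothetical, and the index pair $(p, q)$ in claim (ii) is taken uniform over all $k_{n+1}^2$ values, including $p = q$, exactly as in the display above.

```python
from fractions import Fraction
from itertools import combinations, product


def pair_distributions(k_n, k_next):
    """Exact laws of (L_{n+1,K}, R_{n+1,K}) with K uniform (claim (i)) and of
    (R_{n+1,p}, L_{n+1,q}) with (p, q) uniform (claim (ii)), averaging over all
    equally likely pools of k_next distinct pairs, cf. (10)."""
    grid = list(product(range(1, k_n + 1), repeat=2))  # lexicographic order
    pools = list(combinations(grid, k_next))  # each pool is already sorted
    w = Fraction(1, len(pools))
    within, across = {}, {}
    for pairs in pools:
        for l, r in pairs:
            within[(l, r)] = within.get((l, r), 0) + w / k_next
        for (_, r), (l, _) in product(pairs, repeat=2):
            across[(r, l)] = across.get((r, l), 0) + w / k_next ** 2
    return within, across
```

With `k_n = 2` and `k_next = 3`, every one of the four cells receives probability exactly $1/4 = 1/k_n^2$ in both dictionaries, in agreement with (35).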


Using Proposition 1, it is easy to demonstrate Theorem 4.

Proof of Theorem 4: Block $X_{k2^n+j:k2^n+j+m-1}$ is a subsequence of $X^n_{k:k+1}$ for
$m \le 2^n$, $k \in \mathbb{N}$, and $0 \le j < 2^n$. In particular, there exist functions $f_{mj}$ such
that
$$X_{k2^n+j:k2^n+j+m-1} = f_{mj}(X^n_{k:k+1}).$$
By (33) and Proposition 1, the distribution of $X^n_{k:k+1} = Y^n_{K_{nk}} \times Y^n_{K_{n,k+1}}$ does not
depend on $k$. Hence probabilities $P(X_{i:i+m-1} = x_{1:m})$ are periodic functions of $i \ge 2^n$
with period $2^n$. This implies the formula for $\mu(x_{1:m})$. □


IV Bounds for the stationary mean 

This section opens the discussion of various auxiliary results necessary to establish
Theorem 5, the main result of this paper. The theorem operates with three
functions of the stationary mean of the RHA process: Shannon block entropy,
maximal repetition, and topological entropy. We first observe that it may be
easier to analyze the behavior of blocks $X^n_j$ drawn from the original RHA
process than the behavior of its stationary mean. For this reason, in this section
we want to derive some bounds for the entropies and the maximal repetition of
the stationary mean from the analogous bounds for blocks $X^n_j$. In the following
we will denote
$$X^n_{kj} := X_{k2^n+j:k2^n+j+2^n-1}. \qquad (36)$$
In particular, we have $X^n_{k0} = X^n_k$.

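Subsequently, for Shannon entropy $H(X) = \mathbf{E}\left[-\log P(X)\right]$, we obtain:

Proposition 2 For the stationary mean $\mu$ of the RHA process, we have
$$H(X^{n-1}_1) \le H_\mu(2^n) \le H(X^{n+1}_1) + n\log 2. \qquad (37)$$

Proof: By the Jensen inequality for the concave function $p \mapsto -p\log p$ and Theorem 4, we
obtain
$$H_\mu(2^n) \ge \frac{1}{2^n}\sum_{j=0}^{2^n-1} H(X^n_{kj}). \qquad (38)$$
Now we observe that for each $k \ge 1$ and $j$ there exists a $q$ such that $X^{n-1}_q$ is a
subsequence of $X^n_{kj}$. Thus we have $H(X^n_{kj}) \ge H(X^{n-1}_q) = H(X^{n-1}_1)$, where the
equality holds because, by (33) and Proposition 1, the distribution of $X^{n-1}_q$ does not
depend on $q$. This combined with inequality (38) yields $H(X^{n-1}_1) \le H_\mu(2^n)$. On the
other hand, using inequality $\mu(x_{1:2^n}) \ge 2^{-n}P(X^n_{kj} = x_{1:2^n})$ and Theorem 4, we obtain
$$H_\mu(2^n) \le \frac{1}{2^n}\sum_{j=0}^{2^n-1} H(X^n_{kj}) + n\log 2. \qquad (39)$$
Now we observe that for each even $k \ge 2$ and each $j$ there exists a $q$ such that $X^n_{kj}$ is a
subsequence of $X^{n+1}_q$. Thus we have $H(X^n_{kj}) \le H(X^{n+1}_q) = H(X^{n+1}_1)$. This combined with
inequality (39), taken for an even $k$, yields $H_\mu(2^n) \le H(X^{n+1}_1) + n\log 2$. □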

Analogously, we can bound the maximal repetition of the stationary mean.
The result will be stated more generally. We will say that a function $\phi : \mathbb{A}^+ \to \mathbb{R}$
is increasing if for $u$ being a subsequence (i.e., a contiguous substring) of $w$, we have $\phi(u) \le \phi(w)$. Examples
of increasing functions include the maximal repetition $L(w)$, the topological entropy
$H_{\mathrm{top}}(m|w)$, and the indicator function $\mathbf{1}\{\phi(w) > k\}$, where $\phi$ is increasing.


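Proposition 3 For the stationary mean $\mu$ of the RHA process and an increasing
function $\phi$, we have
$$\mathbf{E}\,\phi(X^{n-1}_1) \le \mathbf{E}_\mu\,\phi(\xi_{1:2^n}) \le \mathbf{E}\,\phi(X^{n+1}_1). \qquad (40)$$

Proof: By Theorem 4,
$$\mathbf{E}_\mu\,\phi(\xi_{1:2^n}) = \frac{1}{2^n}\sum_{j=0}^{2^n-1} \mathbf{E}\,\phi(X^n_{kj}). \qquad (41)$$
Now we observe that for each $k \ge 1$ and $j$ there exists a $q$ such that $X^{n-1}_q$
is a subsequence of $X^n_{kj}$. Thus we have $\phi(X^n_{kj}) \ge \phi(X^{n-1}_q)$. This combined
with equality (41) yields $\mathbf{E}\,\phi(X^{n-1}_1) \le \mathbf{E}_\mu\,\phi(\xi_{1:2^n})$. On the other hand, for
each even $k \ge 2$ and each $j$ there exists a $q$ such that $X^n_{kj}$ is a subsequence of $X^{n+1}_q$.
Thus we have $\phi(X^n_{kj}) \le \phi(X^{n+1}_q)$. This combined with equality (41), taken for an even $k$, yields
$\mathbf{E}_\mu\,\phi(\xi_{1:2^n}) \le \mathbf{E}\,\phi(X^{n+1}_1)$. □

Hence, to obtain the desired bounds for the stationary mean, it suffices to
investigate the distribution of blocks $X^n_j$.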

V Further auxiliary results 

To make another observation, Theorem 5 links the Shannon block entropy, maximal
repetition and topological entropy of the RHA process with its parameters
called perplexities $k_n$. Therefore, the goal of this section is to furnish some
bounds for the topological entropy and maximal repetition of blocks $X^n_j$ in terms
of perplexities $k_n$. In contrast, in the next section we will use perplexities $k_n$ to
bound the Shannon entropies of blocks $X^n_j$.

Let us begin with a simple upper bound for the topological entropy of blocks
$X^n_j$. From this bound we can then obtain a lower bound for the maximal
repetition by Theorem 1.

Proposition 4 For the RHA process, almost surely
$$H_{\mathrm{top}}(2^m|X) \le 2\log k_m. \qquad (42)$$

Proof: For a given realization of the RHA process (i.e., for fixed $Y^m_j$), there
are at most $k_m$ different values of blocks $X^m_j$. Therefore, there are at most $k_m^2$
different values of pairs $X^m_{j:j+1}$ in sequence $X$. □

Obtaining a lower bound for the topological entropy and an upper bound
for the maximal repetition of blocks $X^n_j$ is more involved. These topics will
be discussed in the following sections. For this goal, we will consider events
$A_{n,-1} := \emptyset$ and
$$A_{nm} := \left(X^n_1 \text{ consists of } 2^{n-m} \text{ distinct blocks } X^m_j\right). \qquad (43)$$
We have $P(A_{nn}) = 1$ and $A_{nm} \supset A_{n,m-1}$. Probabilities $P(A_{nm})$ will be called
probabilities of no repeat.

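Proposition 5 For the RHA process, we have $P(A_{nm}) = 0$ for $k_m < 2^{n-m}$,
whereas for $k_m \ge 2^{n-m}$ and $m < n$ we have
$$P(A_{nm}) = P(A_{n,m+1})\,\frac{k_m(k_m - 1)\cdots(k_m - 2^{n-m} + 1)}{k_m^2(k_m^2 - 1)\cdots(k_m^2 - 2^{n-m-1} + 1)}. \qquad (44)$$

Proof: There are no more than $k_m$ distinct blocks $Y^m_j$, and hence no more than $k_m$ distinct blocks $X^m_j$ in block $X^n_1$. Thus
$P(A_{nm}) = 0$ for $k_m < 2^{n-m}$. Now assume $k_m \ge 2^{n-m}$. Introduce random
variables $D_{mi}$ such that $X^n_1 = Y^m_{D_{m1}} \times \cdots \times Y^m_{D_{m,2^{n-m}}}$. Consider probabilities
$p_m = P(D_{m1} = d_1, \ldots, D_{m,2^{n-m}} = d_{2^{n-m}})$, where $d_i$ are distinct. It can be easily
shown by induction on decreasing $m$, starting from $p_n = 1/k_n$, that $p_m$ do not depend on $d_i$ and satisfy
$$p_m = p_{m+1}\,\frac{k_{m+1}(k_{m+1} - 1)\cdots(k_{m+1} - 2^{n-m-1} + 1)}{k_m^2(k_m^2 - 1)\cdots(k_m^2 - 2^{n-m-1} + 1)}.$$
Moreover, since $p_m$ do not depend on $d_i$, we obtain $P(A_{nm}) = p_m k_m(k_m - 1)\cdots(k_m - 2^{n-m} + 1)$. Hence the claim follows. □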
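Unrolling recursion (44) from $P(A_{nn}) = 1$ gives $P(A_{nm})$ as a finite product, which is easy to evaluate exactly. The sketch below does this with Python's `fractions` module; the names `falling` and `prob_no_repeat` are hypothetical, and the perplexities in the example are illustrative values satisfying (9), not the ones from (20).

```python
from fractions import Fraction


def falling(x, r):
    """Falling factorial x (x - 1) ... (x - r + 1)."""
    out = 1
    for i in range(r):
        out *= x - i
    return out


def prob_no_repeat(ks, n, m):
    """P(A_{nm}) from Proposition 5, unrolling (44) down from P(A_{nn}) = 1."""
    p = Fraction(1)
    for q in range(n - 1, m - 1, -1):
        if ks[q] < 2 ** (n - q):
            # P(A_{nq}) = 0, and A_{nm} is contained in A_{nq} for m <= q
            return Fraction(0)
        p *= Fraction(falling(ks[q], 2 ** (n - q)), falling(ks[q] ** 2, 2 ** (n - q - 1)))
    return p
```

For `ks = [2, 3, 5, 10]`, `prob_no_repeat(ks, 3, 2)` equals $4/5$: the two halves of $X^3_1$ are distinct exactly when the pair $(L, R)$ attached to the chosen level-3 block has $L \ne R$, and that pair is uniform on the $5 \times 5$ grid, so this happens with probability $20/25$. By contrast, `prob_no_repeat(ks, 3, 1)` is $0$ because only $k_1 = 3 < 4$ distinct blocks of length 2 exist.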

VI Shannon block entropy 

This section is the last preparatory section. Here we will bound the Shannon
entropies of blocks $X^n_j$ in terms of perplexities $k_n$. To establish some necessary
notation, for random variables $X$, $Y$ and $Z$, where $X$ is discrete whereas $Y$
and $Z$ need not be so, besides Shannon entropy $H(X) = \mathbf{E}\left[-\log P(X)\right]$, we
define conditional entropy $H(X|Y) = \mathbf{E}\left[-\log P(X|Y)\right]$, mutual information
$I(X;Y) := H(X) - H(X|Y)$, and conditional mutual information $I(X;Y|Z) :=
H(X|Z) - H(X|Y,Z)$. Given these objects, we will bound the Shannon entropies
of blocks of the RHA process.

The first result is a corollary of Proposition 1, which says that the conditional
entropy of blocks $X^n_j$ given the entire pool of admissible blocks of the same
length $\mathcal{G}_{\le n}$ is exactly equal to the logarithm of perplexity.

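Proposition 6 We have
$$H(X^n_j|\mathcal{G}_{\le n}) = \log k_n \qquad (45)$$
and $I(X^n_j; X^n_{j+1}|\mathcal{G}_{\le n}) = 0$.

Proof: Given $\mathcal{G}_{\le n}$, the correspondence between $X^n_j$ and $K_{nj}$ is one-to-one.
Hence $H(X^n_j|\mathcal{G}_{\le n}) = H(K_{nj}|\mathcal{G}_{\le n})$. From Proposition 1 we further
obtain $H(K_{nj}|\mathcal{G}_{\le n}) = H(K_{nj}) = \log k_n$ and $H(K_{nj}, K_{n,j+1}|\mathcal{G}_{\le n}) =
H(K_{nj}, K_{n,j+1}) = 2\log k_n$. □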

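The second result is an exact expression for the Shannon entropy of the pool
of admissible blocks $\mathcal{G}_{\le n}$, also in terms of perplexities.

Proposition 7 We have
$$H(\mathcal{G}_{\le n}) = \sum_{l=1}^{n} \log\binom{k_{l-1}^2}{k_l}. \qquad (46)$$

Proof: The claim follows by the chain rule $H(\mathcal{G}_{\le n}) = H(\mathcal{G}_{\le n-1}) + H(\mathcal{G}_{\le n}|\mathcal{G}_{\le n-1})$
from $H(\mathcal{G}_{\le 0}) = 0$ and $H(\mathcal{G}_{\le n}|\mathcal{G}_{\le n-1}) = \log\binom{k_{n-1}^2}{k_n}$. □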
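Since the levels of $\mathcal{G}_{\le n}$ are independent and each is uniform by (10), $H(\mathcal{G}_{\le n})$ is simply the logarithm of the number of equally likely outcomes. The short sketch below evaluates (46) in nats and checks it against that count; the function names are hypothetical and the perplexities are illustrative values satisfying (9).

```python
import math


def pool_entropy(ks, n):
    """H(G_{<=n}) in nats via (46): level l contributes log C(k_{l-1}^2, k_l)."""
    return sum(math.log(math.comb(ks[l - 1] ** 2, ks[l])) for l in range(1, n + 1))


def number_of_pools(ks, n):
    """Number of equally likely outcomes of G_{<=n} under (10)."""
    return math.prod(math.comb(ks[l - 1] ** 2, ks[l]) for l in range(1, n + 1))
```

For `ks = [2, 3, 5, 10]` the three levels contribute $\binom{4}{3} = 4$, $\binom{9}{5} = 126$ and $\binom{25}{10} = 3268760$ outcomes, so `pool_entropy(ks, 3)` equals $\log(4 \cdot 126 \cdot 3268760)$.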


Combining the above two results, we can provide an upper bound for the
unconditional Shannon entropy of blocks $X^n_j$.

Proposition 8 We have
$$H(X^n_1) \le \min_{0 \le l \le n}\left(H(\mathcal{G}_{\le l}) + 2^{n-l}\log k_l\right). \qquad (47)$$

Proof: For any $0 \le l \le n$ we have $H(X^n_1) \le H(X^n_1, \mathcal{G}_{\le l}) = H(X^n_1|\mathcal{G}_{\le l}) +
H(\mathcal{G}_{\le l})$, whereas $H(X^n_1|\mathcal{G}_{\le l}) \le 2^{n-l}H(K_{lj}|\mathcal{G}_{\le l}) = 2^{n-l}H(K_{lj}) = 2^{n-l}\log k_l$. □

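Given Propositions 6 and 8, we may introduce an important parameter of
the RHA process, which we will call the combinatorial entropy rate.

Definition 5 The combinatorial entropy rate of the RHA process is
$$h := \inf_{l\in\mathbb{N}} 2^{-l}\log k_l = \lim_{l\to\infty} 2^{-l}\log k_l. \qquad (48)$$

The infimum equals the limit because, by (9), the sequence $2^{-l}\log k_l$ is nonincreasing.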

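Proposition 9 We have
$$\inf_{n\in\mathbb{N}} 2^{-n}H(X^n_1) = h. \qquad (49)$$

Proof: On the one hand, by Proposition 6,
$$\inf_{n\in\mathbb{N}} 2^{-n}H(X^n_1) \ge \inf_{n\in\mathbb{N}} 2^{-n}H(X^n_1|\mathcal{G}_{\le n}) = \inf_{l\in\mathbb{N}} 2^{-l}\log k_l.$$
On the other hand, by Proposition 8,
$$\inf_{n\in\mathbb{N}} 2^{-n}H(X^n_1) \le \inf_{l\in\mathbb{N}}\inf_{n\ge l}\left(2^{-n}H(\mathcal{G}_{\le l}) + 2^{-l}\log k_l\right) = \inf_{l\in\mathbb{N}} 2^{-l}\log k_l.$$
□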


Proposition 9 combined with Proposition 2 yields a bound for the Shannon
entropy rate of the stationary mean of the RHA process.

Proposition 10 For the stationary mean $\mu$ of the RHA process, we have
$$h/2 \le h_\mu \le 2h. \qquad (50)$$

Proof: Divide inequality (37) by $2^n$ and take the infimum over $n$. □

In particular, the combinatorial entropy rate vanishes ($h = 0$) if and only if the
Shannon entropy rate of the stationary mean vanishes ($h_\mu = 0$) as well. This
happens in particular for perplexities (20).
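To see numerically why $h = 0$ for perplexities (20), one can tabulate $2^{-l}\log k_l \approx 2^{-(1-\beta)l}$. The sketch below is an illustration under our own assumptions: the names are hypothetical, and `perplexity` evaluates (20) with floating-point `exp`, so it is exact only while $2^{\beta n}$ stays below about 709 (the overflow limit of a double). The helper `satisfies_condition_9` checks (9) for the first few levels.

```python
import math


def perplexity(n, beta):
    """k_n = floor(exp(2**(beta * n))) from (20); valid only while exp() fits in a float."""
    return math.floor(math.exp(2 ** (beta * n)))


def rate_terms(beta, n_max):
    """The sequence 2**(-l) * log k_l from (48), for l = 0, ..., n_max."""
    return [math.log(perplexity(l, beta)) / 2 ** l for l in range(n_max + 1)]


def satisfies_condition_9(beta, n_max):
    """Check k_{n-1} <= k_n <= k_{n-1}**2 for n = 1, ..., n_max."""
    ks = [perplexity(n, beta) for n in range(n_max + 1)]
    return all(a <= b <= a * a for a, b in zip(ks, ks[1:]))
```

With $\beta = 0.5$ the terms decrease from $\log 2 \approx 0.693$ at $l = 0$ to about $0.002$ at $l = 18$, consistent with $h = 0$. The same check shows that for $\beta = 0.9$ the first step gives $k_0 = 2$ and $k_1 = \lfloor \exp(2^{0.9}) \rfloor = 6 > k_0^2 = 4$, so (9) fails at $n = 1$ for such $\beta$.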

Inequality $H(X^n_1) \ge H(X^n_1|\mathcal{G}_{\le n}) = \log k_n$ gives a certain lower bound for
the Shannon block entropy of the RHA process. For perplexities (20), this lower
bound is orders of magnitude smaller than the upper bound (47). Concluding
this section, we would like to produce a lower bound which is of comparable
order to (47).

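Proposition 11 We have
$$H(X^n_1) \ge \left[\log\binom{k_{l-1}^2}{k_l} - \log\binom{k_{l-1}^2 - 2^{n-l}}{k_l - 2^{n-l}}\right]P(A_{nl}), \qquad (51)$$
where $P(A_{nl})$ are the probabilities of no repeat (43).

Proof: We have
$$H(X^n_1) \ge I(X^n_1; \mathcal{G}_{\le l}|\mathcal{G}_{\le l-1}) = H(\mathcal{G}_{\le l}|\mathcal{G}_{\le l-1}) - H(\mathcal{G}_{\le l}|\mathcal{G}_{\le l-1}, X^n_1).$$
We have $H(\mathcal{G}_{\le l}|\mathcal{G}_{\le l-1}) = \log\binom{k_{l-1}^2}{k_l}$. As for $H(\mathcal{G}_{\le l}|\mathcal{G}_{\le l-1}, X^n_1)$, we may propose
the following bound. Given $X^n_1$ consisting of $2^{n-l}$ distinct blocks of length $2^l$ (and given $\mathcal{G}_{\le l-1}$),
tuple $(L_{lj}, R_{lj})_{j\in\{1,\ldots,k_l\}}$ assumes at most $\binom{k_{l-1}^2 - 2^{n-l}}{k_l - 2^{n-l}}$ distinct values. Hence
$$H(\mathcal{G}_{\le l}|\mathcal{G}_{\le l-1}, X^n_1) \le P(A_{nl})\log\binom{k_{l-1}^2 - 2^{n-l}}{k_l - 2^{n-l}} + (1 - P(A_{nl}))\log\binom{k_{l-1}^2}{k_l},$$
from which the claim follows. □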


VII Main result 


Now we can demonstrate the main result, which will conclude our paper. 

Proof of Theorem 5:

(i) For perplexities (20), the combinatorial entropy rate is $h = 0$. Hence
$h_\mu = 0$ by Proposition 10.

(ii) By (46), entropy $H(\mathcal{G}_{\le n})$ can be bounded as
$$H(\mathcal{G}_{\le n}) = \sum_{l=1}^{n}\log\binom{k_{l-1}^2}{k_l} \le \sum_{l=1}^{n} 2k_l\log k_{l-1} \le 2nk_n\log k_n.$$
Hence, from (47), for $0 \le l \le n$ we obtain an upper bound:
$$H(X^n_1) \le (2lk_l + 2^{n-l})\log k_l.$$
If we choose $l = \lfloor \beta^{-1}\log_2(n\log 2/\log n) \rfloor$, then for perplexities (20) we obtain
$$H(X^n_1) \le \left(2l\,2^{n/\log n} + 2^{n-l}\right)\frac{n\log 2}{\log n} = O\left(2^n\left(\frac{\log n}{n}\right)^{1/\beta-1}\right). \qquad (52)$$


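On the other hand, from Proposition 11, for $1 \le l \le n$ we have
$$H(X^n_1) \ge \left[\log\binom{k_{l-1}^2}{k_l} - \log\binom{k_{l-1}^2 - 2^{n-l}}{k_l - 2^{n-l}}\right]P(A_{nl}) \ge 2^{n-l}\log\frac{k_{l-1}^2}{k_l}\,P(A_{nl}),$$
where the second inequality holds since $\binom{N}{K}\binom{N-r}{K-r}^{-1} = \prod_{i=0}^{r-1}\frac{N-i}{K-i} \ge (N/K)^r$ for $N \ge K$, and where, by Proposition 5,
$$\begin{aligned}
P(A_{nl}) &= \prod_{m=l}^{n-1}\frac{k_m(k_m - 1)\cdots(k_m - 2^{n-m} + 1)}{k_m^2(k_m^2 - 1)\cdots(k_m^2 - 2^{n-m-1} + 1)} \\
&\ge \prod_{m=l}^{n-1}\left(\frac{(k_m - 2^{n-m} + 2)(k_m - 2^{n-m} + 1)}{k_m^2 - 2^{n-m-1} + 1}\right)^{2^{n-m-1}} \\
&\ge \left(\frac{(k_l - 2^{n-l} + 2)(k_l - 2^{n-l} + 1)}{k_l^2 - 2^{n-l-1} + 1}\right)^{2^n} \\
&\ge \left(1 - \frac{k_l 2^{n-l+1}}{k_l^2 - 2^{n-l-1} + 1}\right)^{2^n}.
\end{aligned} \qquad (53)$$
If we choose $l = \lceil \beta^{-1}\log_2(2n) \rceil$, then for perplexities (20) we obtain that
$k_l \ge \lfloor \exp(2n) \rfloor \ge 2^{2n}$. Hence $P(A_{nl})$ is greater than a certain constant
$c > 0$ and
$$H(X^n_1) \ge c\,2^{n-l}\log\frac{k_{l-1}^2}{k_l} = \Omega\left(\frac{2^n}{n^{1/\beta-1}}\right). \qquad (54)$$
By (52) and (54), from Proposition 2 we obtain the desired sandwich
bound for the entropy of the stationary mean.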

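(iii) By Proposition 4 and Proposition 3 we obtain
$$\begin{aligned}
0 &= \mathbf{E}\,\mathbf{1}\{H_{\mathrm{top}}(2^m|X) > 2\log k_m\} \\
&\ge \mathbf{E}\,\mathbf{1}\{H_{\mathrm{top}}(2^m|X^{n+1}_1) > 2\log k_m\} \\
&\ge \mathbf{E}_\mu\,\mathbf{1}\{H_{\mathrm{top}}(2^m|\xi_{1:2^n}) > 2\log k_m\}.
\end{aligned}$$
Hence $\mu$-almost surely $H_{\mathrm{top}}(2^m|\xi_{1:\infty}) \le 2\log k_m \le 2^{\beta m+1}$, which implies
the upper bound $H_{\mathrm{top}}(m|\xi_{1:\infty}) \le C'_1 m^\beta$ for a certain constant $C'_1$. From
this we obtain the lower bound $L(\xi_{1:m}) \ge C'_2(\log m)^{1/\beta}$ by Theorem 1.

As for the converse bounds, we have $L(X^n_1) \ge 2^l$ for $\omega \notin A_{nl}$, where $A_{nl}$ are
the events of no repeat (43). Hence by Proposition 3
$$\mathbf{E}_\mu\,\mathbf{1}\{L(\xi_{1:2^n}) \ge 2^l\} \le \mathbf{E}\,\mathbf{1}\{L(X^{n+1}_1) \ge 2^l\} \le 1 - P(A_{n+1,l}).$$
Now, if we choose $l = \lceil \beta^{-1}\log_2(2n) \rceil$, then for perplexities (20) we obtain
that $k_l \ge \lfloor \exp(2n) \rfloor \ge 2^{2n}$. Hence, by (53), $\sum_{n\in\mathbb{N}}(1 - P(A_{n+1,l})) < \infty$.
Consequently, by the Borel-Cantelli lemma, $L(\xi_{1:2^n}) < 2^l$ must hold $\mu$-almost surely for
sufficiently large $n$. Thus $L(\xi_{1:m}) \le C'_3(\log m)^{1/\beta}$ for
sufficiently large $m$. From this we obtain the lower bound $H_{\mathrm{top}}(m|\xi_{1:\infty}) \ge C'_4 m^\beta$
for sufficiently large $m$ by Theorem 1.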

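(iv) Denote the random ergodic measure $F = \mu(\cdot|\mathcal{I})$ of the stationary mean
$\mu$. The entropy of the shift-invariant algebra with respect to $\mu$ may be
bounded by mutual information as
$$\begin{aligned}
H_\mu(\mathcal{I}) &= \lim_{m\to\infty} I_\mu(\mathcal{I}; \xi_{1:m}) = \lim_{m\to\infty}\left[H_\mu(m) - \mathbf{E}_\mu H_F(m)\right] \\
&\ge \lim_{m\to\infty}\left[H_\mu(m) - \mathbf{E}_\mu H_{\mathrm{top}}(m|\xi_{1:\infty})\right] = \infty,
\end{aligned}$$
where the inequality follows by (7) and the divergence by (21) and claim (iii).
Since the entropy of the shift-invariant algebra is strictly positive, the
measure $\mu$ is nonergodic.

□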